Creators/Authors contains: "Albert, J."

  1. Wickert, A. (Ed.)

    Abstract. Progress in better understanding and modeling Earth surface systems requires an ongoing integration of data and numerical models. Advances are currently hampered by technical barriers that inhibit finding, accessing, and executing modeling software with related datasets. We propose a design framework for Data Components, which are software packages that provide access to particular research datasets or types of data. Because they use a standard interface based on the Basic Model Interface (BMI), Data Components can function as plug-and-play components within modeling frameworks to facilitate seamless data–model integration. To illustrate the design and potential applications of Data Components and their advantages, we present several case studies in Earth surface processes analysis and modeling. The results demonstrate that the Data Component design provides a consistent and efficient way to access heterogeneous datasets from multiple sources and to seamlessly integrate them with various models. This design supports the creation of open data–model integration workflows that can be discovered, accessed, and reproduced through online data sharing platforms, which promotes data reuse and improves research transparency and reproducibility.

     
    Free, publicly-accessible full text available January 1, 2025
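
As a rough illustration of the plug-and-play idea in this abstract, the sketch below hides an in-memory dataset behind a few method names borrowed from the Basic Model Interface (`initialize`, `update`, `get_value`, `get_current_time`, `get_end_time`, `finalize`). The class `ToyDataComponent`, its constructor argument, the variable name `precipitation`, and the function `run_toy_model` are all hypothetical and do not come from the paper; an actual Data Component would implement the full BMI and read a real dataset.

```python
class ToyDataComponent:
    """Hypothetical data component exposing a small BMI-like subset."""

    def __init__(self, records):
        # records: one list of values per time step (stand-in for a dataset).
        self._records = records
        self._step = 0

    def initialize(self, config_file=None):
        # A real component would read the dataset location from config_file.
        self._step = 0

    def get_output_var_names(self):
        return ("precipitation",)  # assumed variable name, for illustration only

    def get_current_time(self):
        return float(self._step)

    def get_end_time(self):
        return float(len(self._records) - 1)

    def get_value(self, name, dest):
        # Copy the values for the current time step into the caller's buffer.
        if name not in self.get_output_var_names():
            raise KeyError(name)
        dest[:] = self._records[self._step]
        return dest

    def update(self):
        # Advance one time step, stopping at the last record.
        if self._step < len(self._records) - 1:
            self._step += 1

    def finalize(self):
        self._records = []


def run_toy_model(component):
    """Stand-in 'model' that sums the variable over every time step."""
    component.initialize()
    buffer = [0.0, 0.0]
    total = 0.0
    while True:
        component.get_value("precipitation", buffer)
        total += sum(buffer)
        if component.get_current_time() >= component.get_end_time():
            break
        component.update()
    component.finalize()
    return total


total = run_toy_model(ToyDataComponent([[1.0, 2.0], [3.0, 4.0], [0.5, 0.5]]))
```

Because `run_toy_model` only talks to the interface methods, a different dataset wrapped in the same methods could be swapped in without changing the model code, which is the property the abstract attributes to Data Components.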
  2. Free, publicly-accessible full text available June 1, 2024
  3. Abstract Motivation

    DNA-based data storage is a quickly growing field that hopes to harness the massive theoretical information density of DNA molecules to produce a competitive next-generation storage medium suitable for archival data. In recent years, many DNA-based storage system designs have been proposed. Given that no common infrastructure exists for simulating these storage systems, comparing many different designs along with many different error models is increasingly difficult. To address this challenge, we introduce FrameD, a simulation infrastructure for DNA storage systems that leverages the underlying modularity of DNA storage system designs to provide a framework to express different designs while being able to reuse common components.

    Results

    We demonstrate the utility of FrameD and the need for a common simulation platform using a case study. Our case study compares designs that utilize strand copies differently, some that align strand copies using multiple sequence alignment algorithms and others that do not. We found that the choice to include multiple sequence alignment in the pipeline is dependent on the error rate and the type of errors being injected and is not always beneficial. In addition to supporting a wide range of designs, FrameD provides the user with transparent parallelism to deal with a large number of reads from sequencing and the need for many fault injection iterations. We believe that FrameD fills a void in the tools publicly available to the DNA storage community by providing a modular and extensible framework with support for massive parallelism. As a result, it will help accelerate the design process of future DNA-based storage systems.

    Availability and implementation

    The source code for FrameD, along with the data generated during the demonstration of FrameD, is available in a public GitHub repository at https://github.com/dna-storage/framed (https://dx.doi.org/10.5281/zenodo.7757762).
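
To make the case study's point more concrete, here is a toy sketch of why the value of aligning strand copies can depend on the error type. It is not FrameD's API or pipeline: the helpers `substitute`, `delete`, and `majority_consensus` are hypothetical, errors are placed by hand rather than sampled, and the consensus is a plain position-wise majority vote with no alignment step.

```python
from collections import Counter


def substitute(strand, pos, base):
    """Return a copy of strand with the base at pos replaced."""
    return strand[:pos] + base + strand[pos + 1:]


def delete(strand, pos):
    """Return a copy of strand with the base at pos removed."""
    return strand[:pos] + strand[pos + 1:]


def majority_consensus(copies, length):
    """Position-wise majority vote over unaligned copies, padded with '-'."""
    padded = [c.ljust(length, "-")[:length] for c in copies]
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*padded))


original = "ACGTACGT"

# Substitutions leave positions in register, so an unaligned vote can fix them.
sub_copies = [substitute(original, 1, "T"), substitute(original, 5, "A"), original]
sub_consensus = majority_consensus(sub_copies, len(original))

# Deletions shift every later base, so the same vote compares mismatched columns.
del_copies = [delete(original, 2), delete(original, 4), original]
del_consensus = majority_consensus(del_copies, len(original))
```

In this hand-built example the substitution-only copies vote back to the original strand while the deletion copies do not, which is the kind of error-dependent behaviour a common simulation platform makes it possible to compare systematically.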

     
  4. Chemical tools to control the activities and interactions of chromatin components have broad impact on our understanding of cellular and disease processes. It is important to accurately identify their molecular effects to inform clinical efforts and interpretations of scientific studies. Chaetocin is a widely used chemical that decreases H3K9 methylation in cells. It is frequently described as a specific inhibitor of the histone methyltransferase activities of SUV39H1/SU(VAR)3–9, although prior observations showed that chaetocin likely inhibits methyltransferase activity through covalent mechanisms involving its epipolythiodioxopiperazine disulfide ‘warhead’ functionality. The continued use of chaetocin in scientific studies may derive from the net effect of reduced H3K9 methylation, irrespective of a direct or indirect mechanism. However, there may be other molecular impacts of chaetocin on SUV39H1 besides the reduction of H3K9 methylation levels that could confound the interpretation of past and future experimental studies. Here, we test a new hypothesis that chaetocin may have an additional downstream impact aside from inhibition of methyltransferase activity. Using a combination of truncation mutants, a yeast two-hybrid system, and direct in vitro binding assays, we show that the human SUV39H1 chromodomain (CD) and HP1 chromoshadow domain (CSD) directly interact. Chaetocin inhibits this binding interaction through its disulfide functionality with some specificity by covalently binding with the CD of SUV39H1, whereas the histone H3–HP1 interaction is not inhibited. Given the key role of HP1 dimers in driving a feedback cascade to recruit SUV39H1 and to establish and stabilize constitutive heterochromatin, this additional molecular consequence of chaetocin should be broadly considered.
  5. Abstract Background

    Vector-borne diseases (VBDs) are important contributors to the global burden of infectious diseases due to their epidemic potential, which can result in significant population and economic impacts. Oropouche fever, caused by Oropouche virus (OROV), is an understudied zoonotic VBD febrile illness reported in Central and South America. The epidemic potential and areas of likely OROV spread remain unexplored, limiting capacities to improve epidemiological surveillance.

    Methods

    To better understand the capacity for spread of OROV, we developed spatial epidemiology models using human outbreaks as OROV transmission-locality data, coupled with high-resolution satellite-derived vegetation phenology. Data were integrated using hypervolume modeling to infer likely areas of OROV transmission and emergence across the Americas.

    Results

    Models based on one-class support vector machine hypervolumes consistently predicted risk areas for OROV transmission across the tropics of Latin America despite the inclusion of different parameters such as different study areas and environmental predictors. Models estimate that up to 5 million people are at risk of exposure to OROV. Nevertheless, the limited epidemiological data available generate uncertainty in projections. For example, some outbreaks have occurred under climatic conditions outside those where most transmission events occur. The distribution models also revealed that landscape variation, expressed as vegetation loss, is linked to OROV outbreaks.

    Conclusions

    Hotspots of OROV transmission risk were detected along the tropics of South America. Vegetation loss might be a driver of Oropouche fever emergence. Modeling based on hypervolumes in spatial epidemiology might be considered an exploratory tool for analyzing data-limited emerging infectious diseases for which little understanding exists on their sylvatic cycles. OROV transmission risk maps can be used to improve surveillance, investigate OROV ecology and epidemiology, and inform early detection.

     
  6. Synopsis

    The increased use of imaging technology in biological research has drastically altered morphological studies in recent decades and allowed for the preservation of important collection specimens alongside detailed visualization of bony and soft-tissue structures. Despite the benefits associated with these newer imaging techniques, there remains a need for more “traditional” methods of morphological examination in many comparative studies. In this paper, we describe the costs and benefits of the various methods of visualizing, examining, and comparing morphological structures. There are significant differences not only in the costs associated with these different methods (monetary, time, equipment, and software), but also in the degree to which specimens are destroyed. We argue not for any one particular method over another in morphological studies, but instead suggest a combination of methods is useful not only for breadth of visualization, but also for the financial and time constraints often imposed on early-career research scientists.

     
  7. Introduction

    Amyotrophic Lateral Sclerosis (ALS) is a paralyzing, multifactorial neurodegenerative disease with limited therapeutics and no known cure. The study goal was to determine which pathophysiological treatment targets appear most beneficial.

    Methods

    A big data approach was used to analyze high copy SOD1 G93A experimental data. The secondary data set comprised 227 published studies and 4,296 data points. Treatments were classified by pathophysiological target: apoptosis, axonal transport, cellular chemistry, energetics, neuron excitability, inflammation, oxidative stress, proteomics, or systemic function. Outcome assessment modalities included onset delay, health status (rotarod performance, body weight, grip strength), and survival duration. Pairwise statistical analysis (two-tailed t-test with Bonferroni correction) of normalized fold change (treatment/control) assessed significant differences in treatment efficacy. Cohen’s d quantified pathophysiological treatment category effect size compared to “all” (i.e., all pathophysiological treatment categories combined).

    Results

    Inflammation treatments were best at delaying onset (d = 0.42, p > 0.05). Oxidative stress treatments were significantly better for prolonging survival duration (d = 0.18, p < 0.05). Excitability treatments were significantly better for prolonging overall health status (d = 0.22, p < 0.05). However, the absolute best pathophysiological treatment category for prolonging health status varied with disease progression: oxidative stress was best for pre-onset health (d = 0.18, p > 0.05); excitability was best for prolonging function near onset (d = 0.34, p < 0.05); inflammation was best for prolonging post-onset function (d = 0.24, p > 0.05); and apoptosis was best for prolonging end-stage function (d = 0.49, p > 0.05). Finally, combination treatments simultaneously targeting multiple pathophysiological categories (i.e., polytherapy) performed significantly (p < 0.05) better than monotherapies at end-stage.

    Discussion

    In summary, the most effective pathophysiological treatments change as a function of assessment modality and disease progression. Shifting pathophysiological treatment category efficacy with disease progression supports the homeostatic instability theory of ALS disease progression.
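
For readers unfamiliar with the two statistics named in the Methods, the following minimal sketch computes Cohen's d from two samples of normalized fold change and a Bonferroni-adjusted significance threshold. The pooled-standard-deviation form of d is an assumption (the abstract does not say which variant was used), and the sample values, function names, and number of comparisons are invented for illustration only.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group, reference):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    n1, n2 = len(group), len(reference)
    pooled_var = ((n1 - 1) * variance(group) + (n2 - 1) * variance(reference)) / (n1 + n2 - 2)
    return (mean(group) - mean(reference)) / sqrt(pooled_var)


def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance level after Bonferroni correction."""
    return alpha / n_comparisons


# Made-up fold-change values (treatment/control), not data from the study.
category = [1.2, 1.4, 1.6]
all_treatments = [1.0, 1.2, 1.4]

d = cohens_d(category, all_treatments)
threshold = bonferroni_threshold(0.05, 10)
```

A category can show a sizeable d yet fail the corrected threshold, which is consistent with the Results reporting some "best" categories with p > 0.05.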
  8. As interest in DNA-based information storage grows, the costs of synthesis have been identified as a key bottleneck. A potential direction is to tune synthesis for data. Data strands tend to be composed of a small set of recurring code word sequences, and they contain longer sequences of repeated data. To exploit these properties, we propose a new framework called DINOS. DINOS consists of three key parts: (i) The first is a hierarchical strand assembly algorithm, inspired by gene assembly techniques, that can assemble arbitrary data strands from a small set of primitive blocks. (ii) The assembly algorithm relies on our novel formulation for how to construct primitive blocks, spanning a variety of useful configurations from a set of code words and overhangs. Each primitive block is a code word flanked by a pair of overhangs that are created by a cyclic pairing process that keeps the number of primitive blocks small. Using these primitive blocks, any data strand of arbitrary length can be assembled, theoretically. We show a minimal system for a binary code with as few as six primitive blocks, and we generalize our processes to support an arbitrary set of overhangs and code words. (iii) We exploit our hierarchical assembly approach to identify redundant sequences and coalesce the reactions that create them to make assembly more efficient. We evaluate DINOS and describe its key characteristics. For example, the number of reactions needed to make a strand can be reduced by increasing the number of overhangs or the number of code words, but increasing the number of overhangs offers a small advantage over increasing code words while requiring substantially fewer primitive blocks. However, density is improved more by increasing the number of code words. We also find that a simple redundancy coalescing technique is able to reduce reactions by 90.6% and 41.2% on average for decompressed and compressed data, respectively, even when the smallest data fragments being assembled are 16 bits. With a simple padding heuristic that finds even more redundancy, we can further decrease reactions for the same operating point up to 91.1% and 59% for decompressed and compressed data, respectively, on average. Our approach offers greater density by up to 80% over a prior general purpose gene assembly technique. Finally, in an analysis of synthesis costs in which we make a 1 GB volume using de novo synthesis versus making only the primitive blocks with de novo synthesis and otherwise assembling using DINOS, we estimate DINOS as 10^5× cheaper than de novo synthesis.
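
The redundancy-coalescing idea in part (iii) can be sketched with a toy reaction counter. This is not the DINOS algorithm: it ignores overhang compatibility entirely, treats each character as one primitive block, always splits a strand at its midpoint, and the function name `count_reactions` is hypothetical. It only shows how reusing an already-assembled intermediate reduces the number of joining reactions in a hierarchical assembly.

```python
def count_reactions(blocks, coalesce):
    """Count pairwise joins needed to build `blocks` by midpoint splitting.

    With coalesce=True, an intermediate product that was already assembled
    is reused instead of being built again.
    """
    if not blocks:
        return 0
    made = set()
    reactions = 0

    def assemble(lo, hi):
        nonlocal reactions
        if hi - lo == 1:
            return (blocks[lo],)  # a primitive block needs no reaction
        mid = (lo + hi) // 2
        product = assemble(lo, mid) + assemble(mid, hi)
        if not (coalesce and product in made):
            reactions += 1  # one join of the two halves
            made.add(product)
        return product

    assemble(0, len(blocks))
    return reactions


strand = "ABABABAB"  # illustrative strand with heavy repetition
without_coalescing = count_reactions(strand, coalesce=False)
with_coalescing = count_reactions(strand, coalesce=True)
```

For this deliberately repetitive strand the count drops from seven joins to three; real data would be far less regular, so the toy says nothing about the percentages reported in the abstract.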